Abstract
The disparity in the languages commonly studied in Natural Language Processing (NLP) is typically reflected by referring to languages as low- vs. high-resourced. However, there is limited consensus on what exactly qualifies as a 'low-resource language.' To understand how NLP papers define and study 'low-resource' languages, we qualitatively analyzed 150 papers from the ACL Anthology and popular speech-processing conferences that mention the keyword 'low-resource.' Based on our analysis, we show how several interacting axes contribute to the 'low-resourcedness' of a language and why this makes it difficult to track progress for each individual language. We hope our work (1) elicits explicit definitions of the terminology when it is used in papers and (2) provides grounding for the different axes to consider when designating a language as low-resource.